Abstract. The Hardy-Littlewood method is a well-known technique in analytic number
theory. Among its spectacular applications are Vinogradov's 1937 result that every suf- 
ficiently large odd number is a sum of three primes, and a related result of Chowla and 
Van der Corput giving an asymptotic for the number of 3-term progressions of primes, all 
less than N. This article surveys recent developments of the author and T. Tao, in which 
the Hardy-Littlewood method has been generalised to obtain, for example, an asymptotic 
for the number of 4-term arithmetic progressions of primes less than N. 
1. Introduction

Godfrey Harold Hardy and John Edensor Littlewood wrote, in the 1920s, a famous
series of papers Some problems of "partitio numerorum". In these papers, whose
content is elegantly surveyed by Vaughan, they took techniques having
their genesis in work of Hardy and Ramanujan on the partition function and
applied them to well-known questions in additive number theory such as Waring's
problem and the Goldbach problem.

Papers III and V in the series were devoted to the sequence of primes.
In particular it was established on the assumption of the Generalised Riemann
Hypothesis that every sufficiently large odd number is the sum of three primes. In
1937 Vinogradov made a further substantial advance by removing the need for
any unproved hypothesis.

The Hardy-Littlewood-Vinogradov method may be applied to give an asymptotic
count for the number of solutions in primes $p_i$ to any fixed linear equation

$$a_1 p_1 + \dots + a_t p_t = b$$

in, say, the box $p_1, \dots, p_t \leq N$, provided that at least 3 of the $a_i$ are non-zero.
This includes the three-primes result, and also the result that there are infinitely
many triples of primes $p_1 < p_2 < p_3$ in arithmetic progression, due to Chowla
and van der Corput.

*This research was partially conducted during the period the author served as a Clay Research
Fellow. He would like to express his sincere gratitude to the Clay Institute, and also to the
Massachusetts Institute of Technology, where he was a Visiting Professor for the academic year
2005-06.

More generally the Hardy-Littlewood method may also be used to investigate
systems such as $Ap = b$, where $A$ is an $s \times t$ matrix with integer entries and,
potentially, $s > 1$. A natural example of such a system is given by the $(k - 2) \times k$
matrix

$$A := \begin{pmatrix} 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -2 & 1 \end{pmatrix} \tag{1}$$

in which case a solution to $Ap = 0$ is just a $k$-term arithmetic progression of
primes.
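As a small illustration of how the system (1) encodes progressions, the following sketch builds the $(k-2) \times k$ matrix and checks that a known 5-term progression of primes lies in its kernel. The helper names `progression_matrix` and `apply_matrix` are hypothetical and chosen only for this example.

```python
def progression_matrix(k):
    """Return the (k-2) x k matrix of system (1) as a list of rows."""
    rows = []
    for i in range(k - 2):
        row = [0] * k
        # Each row encodes x_i - 2 x_{i+1} + x_{i+2} = 0.
        row[i], row[i + 1], row[i + 2] = 1, -2, 1
        rows.append(row)
    return rows


def apply_matrix(matrix, vector):
    """Compute the matrix-vector product with plain integer arithmetic."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


A = progression_matrix(5)
p = [5, 11, 17, 23, 29]  # five primes with common difference 6
print(apply_matrix(A, p))  # [0, 0, 0]
```

A vector that is not an arithmetic progression, such as (1, 2, 4, 8), gives a non-zero image under the same matrix.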

Here, unfortunately, the Hardy-Littlewood method falters in that it generally
requires $t \geq 2s + 1$. In particular it cannot be used to handle progressions of length
four or longer. There are certain special systems with fewer variables which can be
handled. In this context we take the opportunity to mention a beautiful result of
Balog [2], where it is shown that for any $m$ there are distinct primes $p_1 < \dots < p_m$
such that each number $\frac{1}{2}(p_i + p_j)$ is also prime, or in other words that the system

$$\begin{aligned} p_1 + p_2 &= 2p_{12} \\ &\;\;\vdots \\ p_{m-1} + p_m &= 2p_{m-1,m} \end{aligned} \tag{2}$$

has a solution in primes $p_1, \dots, p_m, p_{12}, \dots, p_{m-1,m}$. There is also a result of
Heath-Brown in which it is established that there are infinitely many four-term
progressions in which three members are prime and the fourth is either a
prime or a product of two primes.

The survey of Kumchev and Tolev [25] gives a detailed account of applications
of the Hardy-Littlewood method to additive prime number theory.

The aim of this survey is to give an overview of recent work of Terence Tao
and myself [13, 14, 15]. Our aim, which has been partially successful, is to extend the
Hardy-Littlewood method so that it is capable of handling a more-or-less arbitrary 
system Ap = b, subject to the proviso that we do not expect to be able to handle 
any system which secretly encodes a "binary" problem such as Goldbach or Twin 
Primes. 

This is a large and somewhat technical body of work. Perhaps my main aim 
here is to give a guide to our work so far, pointing out ways in which the various 
papers fit together, and future directions we plan to take. A subsidiary aim is to 
focus as far as possible on key concepts, rather than on details. Of course, one 
would normally aim to do this in a survey article. However in our case we expect 
that many of these details will be substantially cleaned up in future incarnations 
of the theory, whilst the key concepts ought to remain more-or-less as they are. 
I will say rather little about our paper establishing that there are arbitrarily
long arithmetic progressions of primes. Whilst there is considerable overlap
between that paper and the ideas we discuss here, those methods were somewhat
"soft" whereas the flavour of our more recent work is distinctly "hard". We refer
the reader to the survey of Tao in Volume I of these Proceedings, and also to the
other surveys listed in the references.

To conclude this introduction let me remark that the reader should not be under 
the impression that the Hardy-Littlewood method only applies to linear equations 
in primes, or even that this is the most popular application of the method. There 
has, for example, been a huge amount done on the circle of questions surrounding 
Waring's problem, and surveys of this work are available. More generally there are many spectacular
results where variants of the method are used to locate integer points on quite
general varieties, provided of course that there are sufficiently many variables.
The reader may consult Wooley's survey [34] for more information on this.



2. The Hardy-Littlewood heuristic 

We have stated our interest in systems of linear equations in primes. While we 
are still somewhat lacking in theoretical results, there are heuristics which predict 
what answers we should expect in more-or-less any situation. 

It is natural, when working with primes, to introduce the von Mangoldt function
$\Lambda : \mathbb{N} \to \mathbb{R}_{\geq 0}$, defined by

$$\Lambda(n) := \begin{cases} \log p & \text{if } n = p^k \text{ is a prime power} \\ 0 & \text{otherwise.} \end{cases}$$

The prime powers with $k \geq 2$ make a negligible contribution to any additive
expression involving $\Lambda$. Thus, for example, the prime number theorem is equivalent
to the statement that

$$\mathbb{E}_{n \leq N} \Lambda(n) = 1 + o(1).$$

Here we have used the very convenient notation of expectation from probability
theory, setting $\mathbb{E}_{x \in X} := |X|^{-1} \sum_{x \in X}$ for any set $X$.
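The normalisation above can be checked numerically. The following is a minimal sketch, assuming a simple sieve is adequate for the small ranges involved; the function names `von_mangoldt_table` and `average` are hypothetical.

```python
import math


def von_mangoldt_table(N):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= N."""
    lam = [0.0] * (N + 1)
    is_composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, N + 1, p):
            is_composite[m] = True
        # Lambda takes the value log p on every power of the prime p.
        q = p
        while q <= N:
            lam[q] = math.log(p)
            q *= p
    return lam


def average(values, N):
    """The expectation E_{n <= N} of a tabulated function."""
    return sum(values[1:N + 1]) / N


for N in (10**3, 10**4, 10**5):
    print(N, average(von_mangoldt_table(N), N))
```

The printed averages should approach 1 as N grows, in line with the prime number theorem.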

We now discuss a version of the Hardy-Littlewood heuristic for systems of linear 
equations in primes. Here, and for the rest of the article, we restrict attention to 
homogeneous systems for simplicity of exposition. 

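Conjecture 2.1 (Hardy-Littlewood). Let $A$ be a fixed $s \times t$ matrix with integer
entries and such that there is at least one non-zero solution to $Ax = 0$ with
$x_1, \dots, x_t \geq 0$. Then

$$\mathbb{E}_{\substack{x_1, \dots, x_t \leq N \\ Ax = 0}} \Lambda(x_1) \cdots \Lambda(x_t) = \mathfrak{S}(A)(1 + o(1))$$

as $N \to \infty$, where the singular series $\mathfrak{S}(A)$ is equal to a product of local factors
$\prod_p \alpha_p$, where

$$\alpha_p := \frac{\mathbb{P}(x \in (\mathbb{F}_p^*)^t \mid x \in \mathbb{F}_p^t,\ Ax = 0)}{\mathbb{P}(x \in (\mathbb{F}_p^*)^t \mid x \in \mathbb{F}_p^t)}.$$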
The singular series reflects "local obstructions" to having solutions to $Ax = 0$
in primes; in the simple example $A = (1 \;\; 9 \;\; {-27})$, where the associated equation
$p_1 + 9p_2 - 27p_3 = 0$ has no solutions, one has $\alpha_3 = 0$. A more elegant formulation
of the conjecture would include a "local obstruction at $\infty$" $\alpha_\infty$, in exchange for
removing the hypothesis on $A$.
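The local factors can be computed by brute force for small primes. The sketch below enumerates all vectors modulo p, which is only feasible for small p and few variables; `local_factor` is a hypothetical helper name, and exact rational arithmetic is used so the values can be compared with closed forms.

```python
from fractions import Fraction
from itertools import product


def local_factor(A, p):
    """Brute-force the local factor alpha_p for an integer matrix A (list of rows)."""
    t = len(A[0])
    solutions = 0
    unit_solutions = 0
    for x in product(range(p), repeat=t):
        if all(sum(a * xi for a, xi in zip(row, x)) % p == 0 for row in A):
            solutions += 1
            if all(xi != 0 for xi in x):
                unit_solutions += 1
    # Numerator: chance that a solution of Ax = 0 has all coordinates non-zero mod p.
    numerator = Fraction(unit_solutions, solutions)
    # Denominator: the same chance for an unconstrained vector.
    denominator = Fraction(p - 1, p) ** t
    return numerator / denominator


three_term = [[1, -2, 1]]
for p in (2, 3, 5, 7):
    print(p, local_factor(three_term, p))

print(local_factor([[1, 9, -27]], 3))  # the obstructed example: 0
```

For the matrix (1 -2 1) and odd p the computed values agree with $1 - 1/(p-1)^2$, and the factor at p = 2 is 2.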

Chowla and van der Corput's results concerning three-term progressions of
primes confirm the prediction of Conjecture 2.1 for the matrix $A = (1 \;\; {-2} \;\; 1)$.
From this it is easy to derive an asymptotic for the number of triples $(p_1, p_2, p_3)$,
$p_1 < p_2 < p_3 \leq N$, of primes in arithmetic progression.

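Theorem 2.2 (Chowla, van der Corput). The number of triples of primes
$(p_1, p_2, p_3)$, $p_1 < p_2 < p_3 \leq N$, in arithmetic progression is

$$\mathfrak{S}_3 N^2 \log^{-3} N (1 + o(1)),$$

where

$$\mathfrak{S}_3 := \frac{1}{2} \prod_{p \geq 3} \Bigl(1 - \frac{1}{(p-1)^2}\Bigr).$$

The singular series $\mathfrak{S}_3$ is equal to $\frac{1}{4}\mathfrak{S}(A)$, where $A = (1 \;\; {-2} \;\; 1)$, and is also
half the twin prime constant.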
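The count in Theorem 2.2 is easy to tabulate for small N. The sketch below counts the triples directly and prints the predicted main term next to them; the names `primes_up_to`, `count_prime_3aps` and `singular_series_3` are hypothetical, and the infinite product is truncated at an arbitrary cutoff. Agreement at small N is only rough, since the main term replaces each prime weight by a logarithm.

```python
import math


def primes_up_to(N):
    """Sieve of Eratosthenes."""
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(N ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, N + 1, p):
                sieve[m] = False
    return [n for n in range(N + 1) if sieve[n]]


def count_prime_3aps(N):
    """Number of triples of primes p1 < p2 < p3 <= N in arithmetic progression."""
    primes = primes_up_to(N)
    prime_set = set(primes)
    count = 0
    for i, p1 in enumerate(primes):
        for p3 in primes[i + 1:]:
            # The middle term must be the integer midpoint of p1 and p3.
            if (p1 + p3) % 2 == 0 and (p1 + p3) // 2 in prime_set:
                count += 1
    return count


def singular_series_3(cutoff):
    """Truncation of (1/2) * prod_{p >= 3} (1 - 1/(p-1)^2) to primes p <= cutoff."""
    value = 0.5
    for p in primes_up_to(cutoff):
        if p >= 3:
            value *= 1 - 1 / (p - 1) ** 2
    return value


N = 2000
print(count_prime_3aps(N), singular_series_3(1000) * N**2 / math.log(N) ** 3)
```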

Certain systems Ap = b should be thought of as very difficult indeed, since 
their understanding implies an understanding of a binary problem such as the 
Goldbach or twin prime problem. If A has the property that every non-zero vector 
in its row span (over Q) has at least three non-zero entries then there is no such 
reason to believe that it should be fantastically hard to solve. 

Definition 2.3 (Non-degenerate systems). Suppose that $s, t$ are positive integers
with $t \geq s + 2$. We say that an $s \times t$ matrix $A$ with integer entries is non-degenerate
if it has rank $s$, and if every non-zero vector in its row span (over $\mathbb{Q}$) has at least
three non-zero entries.

The reader may care to check that the system (1) defining a progression of
length $k$ is non-degenerate.
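Non-degeneracy can be tested mechanically. The sketch below rests on the following reformulation, which is an observation made for this example rather than something stated in the text: a non-zero vector in the row span is supported on at most two coordinates i, j exactly when deleting columns i and j from A lowers the rank below s. The names `rank` and `is_non_degenerate` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank over Q, by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_non_degenerate(A):
    """Check Definition 2.3 for an integer matrix A given as a list of rows."""
    s, t = len(A), len(A[0])
    if t < s + 2 or rank(A) != s:
        return False
    for i, j in combinations(range(t), 2):
        reduced = [[x for c, x in enumerate(row) if c not in (i, j)] for row in A]
        # A rank drop means some non-zero combination of rows vanishes off {i, j}.
        if rank(reduced) < s:
            return False
    return True


print(is_non_degenerate([[1, -2, 1, 0], [0, 1, -2, 1]]))  # 4-term progressions
print(is_non_degenerate([[1, -1, 0]]))                    # a binary-type equation
```

The second example has a row with only two non-zero entries, which is the shape of equation that encodes a binary problem.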

Our eventual goal is to prove Conjecture 2.1 for all non-degenerate systems.
This goal may be subdivided into subgoals according to the value of $s$.

Conjecture 2.4 (Asymptotics for $s$ simultaneous equations). Fix a value of $s \geq 1$
and suppose that $t \geq s + 2$ and that $A$ is a non-degenerate $s \times t$ matrix. Then
Conjecture 2.1 holds for the system $Ap = 0$.

One can also formulate an appropriate conjecture for non-homogeneous systems 
Ap = b, and one would not expect to encounter significant extra difficulties in 
proving it. One might also try to count prime solutions to $Ap = 0$ in which the
primes $p_i$ are subject to different constraints $p_i \leq N_i$, or perhaps are constrained
to lie in a fixed arithmetic progression $p_i \equiv a_i \pmod{q_i}$. One would expect all of
these extensions to be relatively straightforward.

The classical Hardy-Littlewood method can handle the case $s = 1$ of Conjecture
2.4. Our new developments have led to a solution of the case $s = 2$. In particular
we can obtain an asymptotic for the number of 4-term arithmetic progressions of
primes, all less than $N$:

Theorem 2.5 (G.-Tao). The number of quadruples of primes $(p_1, p_2, p_3, p_4)$,
$p_1 < p_2 < p_3 < p_4 \leq N$, in arithmetic progression is

$$\mathfrak{S}_4 N^2 \log^{-4} N (1 + o(1)),$$

where

$$\mathfrak{S}_4 := \frac{3}{4} \prod_{p \geq 5} \Bigl(1 - \frac{3p - 1}{(p-1)^3}\Bigr).$$

3. The Hardy-Littlewood method for primes 

The aim of this section is to describe the Hardy-Littlewood method as it would
normally be applied to linear equations in primes. We will sketch the proof of
Theorem 2.2, the asymptotic for the number of 3-term progressions of primes.
This is equivalent to the $s = 1$ case of Conjecture 2.4 for the specific matrix
$A = (1 \;\; {-2} \;\; 1)$. Very similar means may be used to handle the general case
$s = 1$ of that conjecture.

The Hardy-Littlewood method is, first and foremost, a method of harmonic
analysis. The primes are studied by introducing the exponential sum (a kind of
Fourier transform)

$$S(\theta) := \mathbb{E}_{n \leq N} \Lambda(n) e(\theta n)$$

for $\theta \in \mathbb{R}/\mathbb{Z}$, where $e(\alpha) := e^{2\pi i \alpha}$. It is the appearance of the circle $\mathbb{R}/\mathbb{Z}$ here
which gives the Hardy-Littlewood method its alternative name. Now it is easy to
check that

$$\mathbb{E}_{x_1, x_2, x_3 \leq N} \Lambda(x_1)\Lambda(x_2)\Lambda(x_3) 1_{x_1 - 2x_2 + x_3 = 0} = \int_0^1 S(\theta)^2 S(-2\theta) \, d\theta,$$

whence

$$\mathbb{E}_{\substack{x_1, x_2, x_3 \leq N \\ x_1 - 2x_2 + x_3 = 0}} \Lambda(x_1)\Lambda(x_2)\Lambda(x_3) = (2N + O(1)) \int_0^1 S(\theta)^2 S(-2\theta) \, d\theta. \tag{3}$$

The method consists of gathering information about $S(\theta)$, and then using this
formula to infer an asymptotic for the left-hand side.
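The orthogonality identity behind this formula can be verified numerically for any weight function. The sketch below uses unnormalised sums and replaces the integral over the circle by an average over M = 4N equally spaced points, which is exact here because |x1 - 2 x2 + x3| is always smaller than M; the function names are hypothetical.

```python
import cmath


def e(alpha):
    """The character e(alpha) = exp(2 pi i alpha)."""
    return cmath.exp(2j * cmath.pi * alpha)


def direct_count(f, N):
    """Sum of f(x1) f(x2) f(x3) over x1, x2, x3 in [1, N] with x1 - 2 x2 + x3 = 0."""
    total = 0.0
    for x1 in range(1, N + 1):
        for x2 in range(1, N + 1):
            x3 = 2 * x2 - x1
            if 1 <= x3 <= N:
                total += f[x1] * f[x2] * f[x3]
    return total


def fourier_count(f, N):
    """The same quantity, computed from exponential sums sampled at M points."""
    M = 4 * N  # exceeds the largest possible value of |x1 - 2 x2 + x3|
    total = 0
    for j in range(M):
        theta = j / M
        F = sum(f[n] * e(theta * n) for n in range(1, N + 1))
        G = sum(f[n] * e(-2 * theta * n) for n in range(1, N + 1))
        total += F * F * G
    return (total / M).real


N = 30
weights = [0.0] + [1.0 if n % 3 == 1 else 0.5 for n in range(1, N + 1)]
print(direct_count(weights, N), fourier_count(weights, N))
```

The two printed numbers should agree up to floating-point rounding.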

The process of gathering information about $S(\theta)$ leads us to another key feature
of the Hardy-Littlewood method: the realisation that one must split the set of $\theta$
into two classes, the major arcs $\mathfrak{M}$ in which $\theta \approx a/q$ for some small $q$ and the
minor arcs $\mathfrak{m} := [0, 1) \setminus \mathfrak{M}$. To see why, let us attempt some simple evaluations.
First of all we note that

$$S(0) := \mathbb{E}_{n \leq N} \Lambda(n) = 1 + o(1),$$
this being equivalent to the prime number theorem. To evaluate $S(1/2)$, observe
that almost all of the support of $\Lambda$ is on odd numbers $n$, for which $e(n/2) = -1$.
Thus

$$S(1/2) := \mathbb{E}_{n \leq N} \Lambda(n) e(n/2) = -1 + o(1).$$

The evaluation of $S(1/3)$ is a little more subtle. Most of the support of $\Lambda$ is on $n$
not divisible by 3, and for those $n$ the character $e(n/3)$ takes two values according
as $n \equiv 1 \pmod 3$ or $n \equiv 2 \pmod 3$. We have

$$S(1/3) = e(1/3)\, \mathbb{E}_{n \leq N} 1_{n \equiv 1 \,(\mathrm{mod}\, 3)} \Lambda(n) + e(2/3)\, \mathbb{E}_{n \leq N} 1_{n \equiv 2 \,(\mathrm{mod}\, 3)} \Lambda(n) + o(1) = -1/2 + o(1),$$

this being a consequence of the fact that the primes are asymptotically equally
divided between the congruence classes 1 (mod 3) and 2 (mod 3).
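These three evaluations are easy to reproduce numerically. The following sketch tabulates the von Mangoldt function with a simple sieve and evaluates the normalised sum at 0, 1/2 and 1/3; the cutoff N = 20000 and the function names are arbitrary choices made for this illustration.

```python
import cmath
import math


def von_mangoldt_table(N):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= N."""
    lam = [0.0] * (N + 1)
    is_composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, N + 1, p):
            is_composite[m] = True
        q = p
        while q <= N:
            lam[q] = math.log(p)
            q *= p
    return lam


def S(theta, lam, N):
    """Normalised exponential sum E_{n <= N} Lambda(n) e(theta n)."""
    total = sum(lam[n] * cmath.exp(2j * cmath.pi * theta * n) for n in range(1, N + 1))
    return total / N


N = 20000
lam = von_mangoldt_table(N)
for theta in (0.0, 0.5, 1.0 / 3.0):
    print(theta, S(theta, lam, N))
```

The printed values should be close to 1, -1 and -1/2 respectively, with small imaginary parts.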

In similar fashion one can get an estimate for $S(a/q)$ for small $q$, and indeed
for $S(a/q + \eta)$ for sufficiently small $\eta$, if one uses the prime number theorem in
arithmetic progressions. The set of such $\theta$ is called the major arcs and is denoted
$\mathfrak{M}$. (The notion of "small $q$" might be $q \leq \log^A N$, for some fixed $A$. The notion
of "small $\eta$" might be $|\eta| \leq \log^A N / qN$. The flexibility allowed here depends on
what type of prime number theorem along arithmetic progressions one is assuming.
Unconditionally, the best such theorem is due to Siegel and Walfisz and it is this
theorem which leads to these bounds on $q$ and $|\eta|$.)

Suppose by contrast that $\theta \notin \mathfrak{M}$, that is to say $\theta$ is not close to $a/q$ with $q$
small. We say that $\theta \in \mathfrak{m}$, the minor arcs. It is hard to imagine that in the sum

$$S(\sqrt{2} - 1) = \mathbb{E}_{n \leq N} \Lambda(n) e(n\sqrt{2}) \tag{4}$$

the phases $e(n\sqrt{2})$ could conspire with $\Lambda(n)$ to prevent cancellation. It turns out
that indeed there is substantial cancellation in this sum. This was first proved
by Vinogradov, and nowadays it is most readily established using an identity of
Vaughan [30], which allows one to decompose (4) into three further sums which
are amenable to estimation. We will discuss a variant of this method in §5. For the
particular value $\theta = \sqrt{2} - 1$, and for other highly irrational values, one can obtain an
estimate of the shape $|S(\theta)| \ll N^{-c}$ for some $c > 0$, which is quite remarkable since
applying the best-known error term in the prime number theorem only allows one
to estimate $S(0)$ with the much larger error $O(\exp(-C_\epsilon \log^{3/5 - \epsilon} N))$. By defining
parameters suitably (that is by taking a suitable value of the constant $A$ in the
precise definition of $\mathfrak{M}$), one can arrange that $S(\theta)$ is always very small indeed on
the minor arcs $\mathfrak{m}$, say

$$\sup_{\theta \in \mathfrak{m}} |S(\theta)| \ll \log^{-10} N. \tag{5}$$

Recall now the formula (3). Splitting the integral into that over $\mathfrak{M}$ and that
over $\mathfrak{m}$, we see from Parseval's identity that

$$\Bigl| \int_{\mathfrak{m}} S(\theta)^2 S(-2\theta) \, d\theta \Bigr| \leq \sup_{\theta \in \mathfrak{m}} |S(\theta)| \int_0^1 |S(\theta)|^2 \, d\theta \ll \frac{\log^{-9} N}{N}. \tag{6}$$

Thus in the effort to establish Theorem 2.2 the contribution from the minor arcs $\mathfrak{m}$
may essentially be ignored. The proof of that theorem is now reduced to showing
that

$$\int_{\mathfrak{M}} S(\theta)^2 S(-2\theta) \, d\theta = (1 + o(1)) \frac{1}{N} \prod_{p \geq 3} \Bigl(1 - \frac{1}{(p-1)^2}\Bigr).$$

Since one has asymptotic formulae for $S(\theta)$ (and $S(-2\theta)$) on $\mathfrak{M}$, this is essentially
just a computation, albeit not a particularly straightforward one.

It is instructive to look for the point in the above argument where we used the
fact that $A$ was non-degenerate, that is to say that our problem had at least three
variables. Why can we not use the same ideas to solve the twin prime or Goldbach
problems? The answer lies in the bound (6). In the twin prime problem we would
be looking to bound

$$\Bigl| \int_{\theta \in \mathfrak{m}} |S(\theta)|^2 e(2\theta) \, d\theta \Bigr|,$$

and the only obvious means of doing this is via an inequality of the form

$$\Bigl| \int_{\theta \in \mathfrak{m}} |S(\theta)|^2 e(2\theta) \, d\theta \Bigr| \leq \sup_{\theta \in \mathfrak{m}} |S(\theta)|^c \int_0^1 |S(\theta)|^{2-c} \, d\theta.$$

Now, however, Parseval's identity does not permit one to place a bound on

$$\int_0^1 |S(\theta)|^{2-c} \, d\theta.$$

Indeed this whole endeavour is rather futile since heuristics predict that the minor 
arcs actually make a significant contribution to the asymptotic for twin primes. 

An attempt to count 4-term progressions in primes via the circle method is 
beset by difficulties of a similar kind. 



4. Exponential sums with Möbius

The presentation in the next two sections (and in our papers) is influenced by that
in the beautiful book of Iwaniec and Kowalski [21].

In the previous section we described what is more-or-less the standard approach
to solving linear equations in primes using the Hardy-Littlewood method. In [21,
Ch. 19] one may find a very elegant variant in which the Möbius function $\mu$ is
made to play a prominent role. As we saw above the behaviour of the exponential
sum $S(\theta)$ was a little complicated to describe, depending as it does on how close
to a rational $\theta$ is. By contrast the exponential sum

$$M(\theta) := \mathbb{E}_{n \leq N} \mu(n) e(\theta n)$$

has a very simple behaviour, as the following result of Davenport shows.
Proposition 4.1 (Davenport's Bound). We have the estimate

$$|M(\theta)| \ll_A \log^{-A} N$$

uniformly in $\theta \in [0, 1)$ for any $A > 0$.

In fact on the GRH Baker and Harman obtain the superior bound $|M(\theta)| \ll
N^{-1/4 + \epsilon}$ (that is, $\sum_{n \leq N} \mu(n) e(\theta n) \ll N^{3/4 + \epsilon}$). By analogy with results of Salem and Zygmund concerning random
trigonometric series one might guess that the truth is that $\sup_{\theta \in [0,1)} |M(\theta)| \sim
c\sqrt{\log N / N}$. This is far from known even on GRH; so far as I am aware no lower
bound of the form $\sup_{\theta \in [0,1)} |M(\theta)| \sqrt{N} \to \infty$ is known.
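For a sense of scale, the sum M can be evaluated on a grid of frequencies at a modest N. This is only a numerical sketch under arbitrary choices (N = 2000, a grid of 200 points) and says nothing about the asymptotic statements above; the function names are hypothetical.

```python
import cmath


def mobius_table(N):
    """Return a list mu with mu[n] equal to the Mobius function for 1 <= n <= N."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if is_composite[p]:
            continue
        for m in range(p, N + 1, p):
            if m > p:
                is_composite[m] = True
            mu[m] = -mu[m]          # one sign change per distinct prime factor
        for m in range(p * p, N + 1, p * p):
            mu[m] = 0               # multiples of a square are killed
    return mu


def M(theta, mu, N):
    """Normalised exponential sum E_{n <= N} mu(n) e(theta n)."""
    total = sum(mu[n] * cmath.exp(2j * cmath.pi * theta * n) for n in range(1, N + 1))
    return total / N


N = 2000
mu = mobius_table(N)
grid = [j / 200 for j in range(200)]
print(max(abs(M(theta, mu, N)) for theta in grid))
```

The printed maximum can be compared with the sizes of S at 0, 1/2 and 1/3, which are of order 1.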

Although Davenport's result is easy to describe its proof has the same ingredients
as used in the analysis of $S(\theta)$. One must again divide $\mathbb{R}/\mathbb{Z}$ into major and
minor arcs. On the major arcs one must once more use information equivalent to
a prime number theorem along arithmetic progressions, that is to say information
on the zeros of $L$-functions $L(s, \chi)$ close to the line $\Re s = 1$. On the minor arcs
one uses an appropriate version of Vaughan's identity. One of the attractions of
working with Möbius is that this identity takes a particularly simple form (see [21,
Ch. 13]).

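We offer a rough sketch of how Proposition 4.1 may be used as the main ingredient
in a proof of Theorem 2.2, referring the reader to [21, Ch. 19] for the details.
The key point is that one has the identity

$$\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d).$$

One splits the sum over $d$ into the ranges $d \leq N^{1/10}$ and $d > N^{1/10}$ (say), obtaining
a decomposition $\Lambda = \Lambda^\sharp + \Lambda^\flat$. Writing $n = kd$, with $k$ now denoting the variable
carrying $\mu$, one has

$$S^\flat(\theta) := \mathbb{E}_{n \leq N} \Lambda^\flat(n) e(\theta n) = \frac{1}{N} \sum_{d \leq N^{9/10}} \log d \sum_{N^{1/10} < k \leq N/d} \mu(k) e(\theta k d),$$

from which it follows easily using Davenport's bound that

$$S^\flat(\theta) \ll_A \log^{-A} N \tag{7}$$

uniformly in $\theta \in [0, 1)$.

One may then write the expression

$$\mathbb{E}_{\substack{x_1, x_2, x_3 \leq N \\ x_1 - 2x_2 + x_3 = 0}} \Lambda(x_1)\Lambda(x_2)\Lambda(x_3)$$

as a sum of eight terms using the splitting $\Lambda = \Lambda^\sharp + \Lambda^\flat$. The basic idea is now
that the main term $\prod_p \alpha_p$ in Theorem 2.2 comes from the term with three copies
of $\Lambda^\sharp$, whilst the other 7 terms (each of which contains at least one $\Lambda^\flat$) provide a
negligible contribution in view of (7) and simple variants of the formula (3).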
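The divisor-sum identity and the resulting splitting can be checked directly for small n. In the sketch below the cutoff is passed in as a plain number rather than fixed at a tenth power of N, and the helper names `mobius` and `lambda_sharp_flat` are hypothetical.

```python
import math


def mobius(n):
    """Mobius function by trial division (adequate for small n)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def lambda_sharp_flat(n, cutoff):
    """Split sum_{d | n} mu(d) log(n/d) into the parts with d <= cutoff and d > cutoff."""
    sharp = 0.0
    flat = 0.0
    for d in range(1, n + 1):
        if n % d == 0:
            term = mobius(d) * math.log(n / d)
            if d <= cutoff:
                sharp += term
            else:
                flat += term
    return sharp, flat


for n in (7, 8, 12, 30, 49):
    sharp, flat = lambda_sharp_flat(n, cutoff=3)
    print(n, sharp, flat, sharp + flat)
```

In each row the last column recovers the von Mangoldt function: log p at prime powers and 0 elsewhere, up to rounding.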

We have extolled the virtues of the Mobius function by pointing to the aesthetic 
qualities of Davenport's bound. A more persuasive argument for focussing on it is 
the following basic metaprinciple of analytic number theory: 
Principle (Möbius randomness law). The Möbius function is highly orthogonal to
any "reasonable" bounded function $f : \mathbb{N} \to \mathbb{C}$. That is to say

$$\mathbb{E}_{n \leq N} \mu(n) f(n) = o(1),$$

and usually one would in fact expect

$$\mathbb{E}_{n \leq N} \mu(n) f(n) \ll N^{-1/2 + \epsilon}. \tag{8}$$

In the category "reasonable" in this context one would certainly include polynomial
phases and other somewhat continuous objects, but one should exclude
functions $f$ which are closely related to the primes ($f = \mu$ and $f = \Lambda$, for example,
are clearly not orthogonal to Möbius).

At a finer level than is relevant to our work, the Möbius randomness law is more
reliable than other heuristics that one might formulate, for example concerning $\Lambda$.
In [22] it is shown that

$$\mathbb{E}_{n \leq N} \Lambda(n) \lambda(n) e(-2\sqrt{n}) \sim cN^{-1/4},$$

where $\lambda(n) := n^{-11/2} \tau(n)$ is a normalised version of Ramanujan's $\tau$-function. One
could hardly be called naive for expecting square root cancellation here.



5. Proving the Möbius randomness law

In the last section we mentioned a principle, the Möbius randomness law, which
is very useful as a guiding principle in analytic number theory. Unfortunately it
is not possible to prove the strong version (8) of the principle in any case: even
when $f(n) = 1$ it is equivalent to the Riemann hypothesis.

It is, however, possible to prove weaker estimates of the form

$$\mathbb{E}_{n \leq N} \mu(n) f(n) \ll_A \log^{-A} N, \tag{9}$$

for arbitrary $A > 0$, for a wide variety of functions $f$. Davenport's bound is
precisely this result when $f(n) = e(\theta n)$ (and, furthermore, this result is uniform
in $\theta$). Similar statements are also known for polynomial phases and for Dirichlet
characters (uniformly over all characters of a fixed conductor).

Now when it comes to proving an estimate of the form (9) one should think of
there being two different classes of behaviour for $f$. In the first class are those $f$
which are in a vague sense multiplicative, or linear combinations of a few multiplicative
functions. Then the behaviour of $\mathbb{E}_{n \leq N} \mu(n) f(n)$ can be intimately connected
with the zeros of $L$-functions. One has, for example, the formula

$$\sum_{n=1}^{\infty} \mu(n) \chi(n) n^{-s} = \frac{1}{L(s, \chi)}$$

for any fixed Dirichlet character $\chi$. By the standard contour integration technique
(Perron's formula) of analytic number theory one sees that $\mathbb{E}_{n \leq N} \mu(n) \chi(n)$ is small
provided that $L(s, \chi)$ does not have zeros close to $\Re s = 1$. (In fact, as reported on
[21, p. 124], there are complications caused by possible multiple zeros of $L$, and it
is better to work first with the sum $\mathbb{E}_{n \leq N} \Lambda(n) \chi(n)$ of $\chi$ over primes.)

The need to consider zeros of L-functions can also be felt when considering 
additive characters e(an/q), for relatively small q. Indeed any Dirichlet character 
to the modulus q may be expressed as a linear combination of such characters. 
Conversely any additive character e(an/q) may be written as a linear combination 
of Dirichlet characters to moduli dividing q by using Gauss sums. By applying 
Siegel's theorem, which gives the best unconditional information concerning the
location of zeros of $L(s, \chi)$ near to $\Re s = 1$, one obtains for any $A$ the estimate

$$\mathbb{E}_{n \leq N} \mu(n) e(an/q) \ll_A \log^{-A} N,$$

uniformly for $q \leq \log^A N$. By partial summation the same estimate holds when
$a/q$ is replaced by $\theta = a/q + \eta$ for suitably small $\eta$, that is to say for all $\theta$ which
lie in the set $\mathfrak{M}$ of major arcs.

We turn now to a completely different technique for bounding $\mathbb{E}_{n \leq N} \mu(n) f(n)$.
Remarkably this is at its most effective when the previous technique fails, that is
to say when $f$ is somehow far from multiplicative.

Proposition 5.1 (Type I and II sums control sums with Möbius). Let $f : \mathbb{N} \to \mathbb{C}$
be a function with $\|f\|_\infty \leq 1$, and suppose that the following two estimates hold.

1. (Type I sums are small) For all $D \leq N^{2/3}$, and for all sequences $(a_d)_{d=D}^{2D}$
with $\|a\|_{\ell^2[D, 2D)} = 1$, we have

$$\Bigl| \sum_{d=D}^{2D} \sum_{1 \leq w < N/d} a_d f(wd) \Bigr| \ll_A N (\log N)^{-A-5}. \tag{10}$$

2. (Type II sums are small) For all $D, W$ with $N^{1/3} \leq D \leq N^{2/3}$, $N^{1/3} \leq W \leq
N/D$ and all choices of complex sequences $(a_d)_{d=D}^{2D}$, $(b_w)_{w=W}^{2W}$ with $\|a\|_{\ell^2[D, 2D)}
= \|b\|_{\ell^2[W, 2W)} = 1$, we have

$$\Bigl| \sum_{d=D}^{2D} \sum_{W \leq w \leq 2W} a_d b_w f(wd) \Bigr| \ll_A N (\log N)^{-A-5}. \tag{11}$$

Then

$$\mathbb{E}_{n \leq N} \mu(n) f(n) \ll_A \log^{-A} N. \tag{12}$$

The reader may find a proof of this statement in [14, Ch. 6]. It is proved
by decomposing the Möbius function into two parts using an identity of Vaughan
[30]. When one multiplies by $f(n)$ and sums, one of these parts leads to Type I
sums and the other to Type II sums. Note that there is considerable flexibility in
arranging the ranges of $D$ in which Type I and II estimates are required, but it is
not important to have such flexibility in our arguments.
The statement of Proposition 5.1 may look complicated. What has been
achieved, however, is the elimination of $\mu$. Strictly speaking, one actually only
needs Type I and II estimates for some rather specific choices of coefficients $a_d, b_w$
whose definition involves $\mu$. The important realisation is that it is best to forget
about the precise forms of these coefficients, the general expressions (10) and (11)
laying bare the important underlying information required of $f$.

Note that if $f$ is close to multiplicative then there is no hope of obtaining
enough cancellation in Type II sums to make use of Proposition 5.1. If $f$ is actually
completely multiplicative, for example, one may take $a_d = \overline{f(d)}$ and $b_w = \overline{f(w)}$ and
there is manifestly no cancellation at all in (11). If this is not the case, however,
then very often it is possible to verify the bounds (10) and (11). An example of
this is a linear phase $e(\theta n)$ where $\theta$ lies in the minor arcs $\mathfrak{m}$, that is to say $\theta$ is
not close to $a/q$ with $q$ small. By verifying these two estimates for such $\theta$, one has
from (12) that Davenport's bound holds when $\theta \in \mathfrak{m}$. This completes the proof
of Davenport's bound, since the major arcs $\mathfrak{M}$ have already been handled using
$L$-function technology.

To see how this is usually achieved in practice we refer the reader to Ch. 
24]. There the reader will see that a key device is the Cauchy-Schwarz inequality, 
which allows one to eliminate the arbitrary coefficients $a_d, b_w$.

In [14] there is also a discussion of this result. Although logically equivalent,
this discussion takes a point of view which turns out to be invaluable when dealing
with more complicated situations. Taking $f(n) = e(\theta n)$ in Proposition 5.1 we
suppose that either (10) or (11) does not hold, that is to say that either a Type I
or a Type II sum is large. We then deduce that $\theta$ must be close to a rational with
small denominator, that is to say $\theta$ must be major arc. This inverse approach
to bounding sums with Mobius means that there is no need to make an a priori 
definition of what a "major" or "minor" object is. In situations to be discussed 
later this helps enormously. 



6. The insufficiency of harmonic analysis 

What did we mean when we stated that the Hardy-Littlewood method was a
method of harmonic analysis? In §3 we saw that there is a formula, (3), which
expresses the number of 3-term progressions in a set (such as the primes) in terms of
the exponential sum over that set. The following proposition is an easy consequence
of a slightly generalised version of that formula:

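Proposition 6.1. Suppose that $f_1, f_2, f_3 : [N] \to [-1, 1]$ are three functions and
that

$$\bigl| \mathbb{E}_{\substack{x_1, x_2, x_3 \leq N \\ x_1 - 2x_2 + x_3 = 0}} f_1(x_1) f_2(x_2) f_3(x_3) \bigr| \geq \delta.$$

Then for any $i = 1, 2, 3$ we have

$$\sup_{\theta \in [0,1)} |\mathbb{E}_{n \leq N} f_i(n) e(n\theta)| \geq (1 + o(1)) \delta / 2. \tag{13}$$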
We think of this as a statement to the effect that the linear exponentials $e(n\theta)$
form a characteristic system for the linear equation $x_1 - 2x_2 + x_3 = 0$. It follows
immediately from Proposition 6.1 and Davenport's bound that Möbius exhibits
cancellation along 3-term APs, in the sense that

$$\mathbb{E}_{\substack{x_1, x_2, x_3 \leq N \\ x_1 - 2x_2 + x_3 = 0}} \mu(x_1) \mu(x_2) \mu(x_3) \ll_A \log^{-A} N.$$

Proposition 6.1 is also useful for counting progressions in sets $A \subseteq [N]$, in which
context one would take various of the $f_i$ to equal the balanced function $f_A := 1_A - \alpha$
of $A$, where $\alpha := |A|/N$. It is easy to deduce from Proposition 6.1 the following
variant, which covers this situation.

Proposition 6.2. Suppose that $A \subseteq [N]$ is a set with $|A| = \alpha N$ and that

$$\bigl| \mathbb{E}_{\substack{x_1, x_2, x_3 \leq N \\ x_1 - 2x_2 + x_3 = 0}} 1_A(x_1) 1_A(x_2) 1_A(x_3) - \alpha^3 \bigr| \geq \delta.$$

Write

$$f_A := 1_A - \alpha$$

for the balanced function of $A$. Then we have

$$\sup_{\theta \in [0,1)} |\mathbb{E}_{n \leq N} f_A(n) e(n\theta)| \geq (1 + o(1)) \delta / 14. \tag{14}$$

If a function $f$ correlates with a linear exponential as in (13) or (14) then we
sometimes say that $f$ has linear bias.

In this section we give examples which show that the linear exponentials do not
form a characteristic system for the pair of equations $x_1 - 2x_2 + x_3 = x_2 - 2x_3 + x_4 = 0$
defining a four-term progression. These examples show, in a strong sense, that
the Hardy-Littlewood method in its traditional form cannot be used to study
4-term progressions. An interesting feature of these two examples is that they
were both essentially discovered by Furstenberg and Weiss in the context of
ergodic theory. Much of our work is paralleled in, and in fact motivated by, the
work of the ergodic theory community. See the lecture by Tao in Volume 1 of
these proceedings, or the elegant surveys of Kra for more discussion and
references. The examples were rediscovered, in the finite setting, by Gowers
in his work on Szemerédi's theorem.

Example 6.1 (Quadratic and generalised quadratic behaviour). Let $\alpha > 0$ be a
small, fixed, real number, and define the following sets. Let $A_1$ be defined by

$$A_1 := \{x \in [N] : \{x^2 \sqrt{2}\} \in [-\alpha/2, \alpha/2]\}$$

(here, $\{t\}$ denotes the fractional part of $t$, and lies in $(-1/2, 1/2]$). Define also

$$A_2 := \{x \in [N] : \{x\sqrt{2}\{x\sqrt{3}\}\} \in [-\alpha/2, \alpha/2]\}.$$
Now it can be shown (not altogether straightforwardly) that $|A_1|, |A_2| \approx \alpha N$,
and furthermore that

$$\sup_{\theta \in [0,1)} |\mathbb{E}_{n \leq N} f_{A_i}(n) e(n\theta)| \ll N^{-c}$$

for $i = 1, 2$. Thus neither of the sets $A_1, A_2$ has linear bias in a rather strong
sense. If the analogue of Proposition 6.2 were true for four-term progressions,
then one would expect both $A_1$ and $A_2$ to have approximately $\alpha^4 N^2 / 6$ four-term
progressions.

The set $A_1$, however, has considerably more 4-term APs than this in view of
the identity

$$x^2 - 3(x + d)^2 + 3(x + 2d)^2 - (x + 3d)^2 = 0. \tag{15}$$

This means that if $x, x + d, x + 2d \in A_1$ then

$$\{(x + 3d)^2 \sqrt{2}\} \in [-7\alpha/2, 7\alpha/2],$$

which would suggest that $x + 3d \in A_1$ with probability $\gg 1$. In fact one can show
using harmonic analysis that (15) is the only relevant constraint in the sense that

$$\mathbb{P}(x + 3d \in A_1 \mid x, x + d, x + 2d \in A_1) \approx \mathbb{P}(y_1 - 3y_2 + 3y_3 \in [-1, 1] \mid y_1, y_2, y_3 \in [-1, 1]) = 8/27.$$

The number of 3-term progressions in $A_1$ is $\approx \alpha^3 N^2 / 4$, and so it follows that the
number of 4-term progressions in $A_1$ is $\approx 2\alpha^3 N^2 / 27$.

The analysis of $A_2$ is rather more complicated. However one may check that
if $|\{x\sqrt{3}\}|, |\{d\sqrt{3}\}| \leq 1/10$ and if $|\{y\sqrt{2}\{y\sqrt{3}\}\}| \leq \alpha/10$ for $y = x, x + d, x + 2d$,
then $x + 3d \in A_2$. One can show that there are $\gg \alpha^3 N^2$ choices of $x, d$ satisfying
these constraints, and hence once again $A_2$ contains $\gg \alpha^3 N^2$ 4-term progressions.
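The behaviour of the quadratic set can be explored numerically. The sketch below builds the set defined through the fractional parts of $x^2\sqrt{2}$ with floating-point arithmetic (adequate for the small N used here), counts its 3-term and 4-term progressions with positive common difference, and prints the count one would expect for 4-term progressions in a random set of the same density. The values N = 1000 and 0.2 for the density parameter, and all function names, are arbitrary choices for this illustration.

```python
import math


def centred_frac(t):
    """Fractional part of t, taken in (-1/2, 1/2]."""
    r = t - math.floor(t)
    return r - 1 if r > 0.5 else r


def quadratic_set(N, alpha):
    """The set of x in [1, N] whose value x^2 sqrt(2) is within alpha/2 of an integer."""
    root2 = math.sqrt(2)
    return {x for x in range(1, N + 1) if abs(centred_frac(x * x * root2)) <= alpha / 2}


def count_progressions(S, length):
    """Count progressions x, x+d, ..., x+(length-1)d with d > 0 lying inside S."""
    if not S:
        return 0
    top = max(S)
    count = 0
    for x in S:
        d = 1
        while x + (length - 1) * d <= top:
            if all(x + j * d in S for j in range(1, length)):
                count += 1
            d += 1
    return count


N, alpha = 1000, 0.2
A1 = quadratic_set(N, alpha)
density = len(A1) / N
print("density:", density)
print("3-term progressions:", count_progressions(A1, 3))
print("4-term progressions:", count_progressions(A1, 4))
print("random-set prediction for 4-term:", density**4 * N**2 / 6)
```

Identity (15) itself is a polynomial identity, so it can be confirmed exactly on integers, as the accompanying check does.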



7. Generalised quadratic obstructions 

We saw in the last section that the set of linear exponentials $e(\theta n)$ is not a characteristic
system for 4-term progressions. There we saw examples involving quadratics
$n^2 \theta$ and generalised quadratics $n\theta_1\{n\theta_2\}$, and these must clearly be addressed
by any generalisation of Propositions 6.1 and 6.2 to 4-term APs. Somewhat remarkably,
these quadratic and generalised quadratic examples are in a sense the
only ones.
Proposition 7.1. Suppose that $f_1, f_2, f_3, f_4 : [N] \to [-1, 1]$ are four functions and
that

$$\bigl| \mathbb{E}_{\substack{x_1, x_2, x_3, x_4 \leq N \\ x_1 - 2x_2 + x_3 = 0 \\ x_2 - 2x_3 + x_4 = 0}} f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_4) \bigr| \geq \delta. \tag{16}$$

Then for any $i = 1, 2, 3, 4$ there is a generalised quadratic polynomial

$$\phi(n) = \sum_{r, s \leq C_1(\delta)} \beta_{rs} \{\theta_r n\}\{\theta_s n\} + \gamma_r \{\theta_r n\}, \tag{17}$$

where $\beta_{rs}, \gamma_r, \theta_r \in \mathbb{R}$, such that

$$|\mathbb{E}_{n \leq N} f_i(n) e(\phi(n))| \geq c_2(\delta).$$

We can take $C_1(\delta) \sim \exp(\delta^{-C})$ and $c_2(\delta) \sim \exp(-\cdots)$.
Note that

$$\theta n^2 = 100 \theta N^2 \Bigl\{\frac{n}{10N}\Bigr\}^2$$

and

$$\theta_1 n \{\theta_2 n\} = 10 \theta_1 N \Bigl\{\frac{n}{10N}\Bigr\} \{\theta_2 n\}$$

for $n \leq N$, and so the phases which can be written in the form (17) do include all
those which were discovered to be relevant in the preceding section.

The proof of Proposition 7.1 is given in [13]. It builds on earlier work of Gowers.
In [13] (see also [14]) several results of a related nature are given, in which
other characteristic systems for the equation $x_1 - 2x_2 + x_3 = x_2 - 2x_3 + x_4 = 0$
are given. These systems all have a "quadratic" flavour. We will discuss the
family of 2-step nilsequences, which is perhaps the most conceptually appealing,
in a later section. We will also mention the family of local quadratics, which are useful for
computations involving the Möbius function. The only real merit of the generalised
quadratic phases $e(\phi(n))$ discussed above is that they are easy to describe from
first principles.



8. The Gowers norms and inverse theorems 

The proof of Proposition 7.1 is long and complicated: there does not seem to
be anything so simple as Formula @ in the world of 4-term progressions. Very
roughly speaking one assumes that (16) holds, and then one proceeds to place
more and more structure on each function $f_i$ until eventually one establishes that
$f_i$ correlates with a generalised quadratic phase. There is a finite field setting for
this argument, and we would recommend that the interested reader read this first:
it may be found in ^3 Ch. 5]. The ICM lecture of Gowers [5] is a fine introduction
to the ideas in his paper [SJ, which is the foundation of our work.

There is only one part of the existing theory which we feel sure will play some
role in future incarnations of these methods. This is the first step in the long series
of deductions from (16), in which one shows that each $f_i$ has large Gowers norm.
For the purposes of this exposition$^1$ we define the Gowers $U^2$-norm $\|f\|_{U^2}$ of a
function $f : [N] \to [-1,1]$ by

$$\|f\|_{U^2}^4 := \mathbb{E}_{\substack{x_{00},x_{01},x_{10},x_{11} \leq N \\ x_{00}+x_{11}=x_{01}+x_{10}}} f(x_{00})f(x_{01})f(x_{10})f(x_{11}),$$

$^1$ In practice we do all our work in the group $\mathbb{Z}/N'\mathbb{Z}$ for some prime $N' \geq N$ with $N' \approx M(A)N$,
where $M(A)$ is some constant depending on the system of equations $Ax = 0$ one is interested
in. One advantage of this is that the number of solutions to $Ax = 0$ in $\mathbb{Z}/N'\mathbb{Z}$ is much easier to
count than the number of solutions in $[N]$. The Gowers norms defined here differ from the Gowers
norms in those settings by constant factors, so for expository purposes they may be thought of
as the same. In the group setting the constant $c_A$ in Proposition 8.1 is simply 1.
which is a sort of average of $f$ over two-dimensional parallelograms. The $U^k$-norm,
$k \geq 3$, is an average of $f$ over $k$-dimensional parallelepipeds. Written down
formally it looks much more complicated than it is:

$$\|f\|_{U^k}^{2^k} := \mathbb{E}_{\substack{x_{0,\dots,0},\dots,x_{1,\dots,1} \\ x_{\omega^{(1)}} + x_{\omega^{(2)}} = x_{\omega^{(3)}} + x_{\omega^{(4)}}}} f(x_{0,\dots,0}) \cdots f(x_{1,\dots,1}),$$

where there are $2^k$ variables $x_\omega$, $\omega = (\omega_1,\dots,\omega_k) \in \{0,1\}^k$, and the constraints
range over all quadruples $(\omega^{(1)},\omega^{(2)},\omega^{(3)},\omega^{(4)}) \in (\{0,1\}^k)^4$ with
$\omega^{(1)} + \omega^{(2)} = \omega^{(3)} + \omega^{(4)}$.

The Gowers $U^k$-norm governs the behaviour of any non-degenerate system
$Ax = 0$ in which $A$ has $(k-1)$ rows.

Proposition 8.1 (Generalised von Neumann theorem). Suppose that $A$ is a non-degenerate
$s \times t$ matrix with integer entries. Suppose that $f_1,\dots,f_t : [N] \to [-1,1]$
are functions and that

$$\Big| \mathbb{E}_{\substack{x_1,\dots,x_t \\ Ax = 0}} f_1(x_1) \cdots f_t(x_t) \Big| \geq \delta.$$

Then for each $i = 1,\dots,t$ we have

$$\|f_i\|_{U^{s+1}} \geq c_A\delta.$$

The proof involves $s + 1$ applications of the Cauchy-Schwarz inequality. In
this generality, the result was obtained in [14], though the proof technique is the
same as in ^U]. There are results in ergodic theory of the same general type, in
which "non-conventional ergodic averages" are bounded using seminorms which
are analogous to the $U^k$-norms: see [20].

Taking $s = k - 2$ and $A$ as in 0J, we see that in particular the Gowers $U^{k-1}$-norm
"controls" $k$-term progressions. The Gowers norms are, of course, themselves
defined by a system of linear equations, and so they must be studied as part of 
a generalised Hardy-Littlewood method with as broad a scope as we would like. 
The Generalised von Neumann Theorem may be regarded as a statement to the 
effect that in a sense they represent the only systems of equations that need to be 
studied. 

The Gowers norms do not feature in the classical Hardy-Littlewood method.
It is, however, possible to prove a somewhat weaker version of Proposition 6.1 by
combining the case $k = 3$ of Proposition 8.1 with the following inverse theorem:

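Proposition 8.2 (Inverse theorem for $U^2$). Suppose that $N$ is large and that
$f : [N] \to [-1,1]$ is a function with $\|f\|_{U^2} \geq \delta$. Then we have

$$\sup_{\theta \in [0,1)} |\mathbb{E}_{n \leq N} f(n) e(n\theta)| \geq \tfrac{1}{2}\delta^2.$$

To prove this we note the formula

$$\mathbb{E}_{x_{00},x_{01},x_{10},x_{11}} f(x_{00})f(x_{01})f(x_{10})f(x_{11}) 1_{x_{00}+x_{11}=x_{01}+x_{10}} = \int_0^1 |\widehat{f}(\theta)|^4 \, d\theta,$$

where $\widehat{f}(\theta) := \mathbb{E}_{n \leq N} f(n)e(n\theta)$. Since the equation $x_{00} + x_{11} = x_{01} + x_{10}$ has
$(\tfrac{2}{3} + o(1))N^3$ solutions in $[N]$, this implies that

$$\|f\|_{U^2}^4 = (\tfrac{3}{2}N + O(1))\|\widehat{f}\|_4^4.$$

In view of the fact that $\|\widehat{f}\|_2^2 \leq 1/N$, the bound $\|\widehat{f}\|_4^4 \leq \|\widehat{f}\|_\infty^2\|\widehat{f}\|_2^2$ and the
assumption that $\|f\|_{U^2} \geq \delta$ imply that

$$\|\widehat{f}\|_\infty^2 \geq (\tfrac{2}{3} + o(1))\delta^4,$$

which implies the result.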
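
The identity and the resulting lower bound can be tested by brute force for small $N$. The sketch below is illustrative only: it assumes the $U^2$-norm is the average over additive quadruples in $[N]$ described above, takes a random test function, and evaluates the integral of $|\widehat{f}|^4$ exactly by sampling at $4N$ equally spaced points (legitimate because $|\widehat{f}|^4$ is a trigonometric polynomial of degree less than $2N$). The names `u2_fourth_power` and `fhat` are introduced here and are not taken from the text.

```python
import cmath
import itertools
import random

def u2_fourth_power(f):
    """Return (average, count): the average of f(x00)f(x01)f(x10)f(x11) over all
    quadruples in [N]^4 with x00 + x11 == x01 + x10, and the number of such quadruples."""
    N = len(f)
    total, count = 0.0, 0
    for x00, x01, x10 in itertools.product(range(N), repeat=3):
        x11 = x01 + x10 - x00              # forced by the constraint
        if 0 <= x11 < N:
            total += f[x00] * f[x01] * f[x10] * f[x11]
            count += 1
    return total / count, count

def fhat(f, theta):
    """Normalised Fourier transform E_{n <= N} f(n) e(n theta); f[j] stores f(j + 1)."""
    N = len(f)
    return sum(f[j] * cmath.exp(2j * cmath.pi * (j + 1) * theta) for j in range(N)) / N

random.seed(0)
N = 16
f = [random.uniform(-1.0, 1.0) for _ in range(N)]

u2_4, num_solutions = u2_fourth_power(f)

M = 4 * N                                   # enough sample points for exact quadrature
samples = [abs(fhat(f, j / M)) for j in range(M)]
l4_fourth = sum(s ** 4 for s in samples) / M    # integral of |fhat|^4 over [0, 1)
sup_fhat = max(samples)                         # lower bound for sup |fhat|
delta = max(u2_4, 0.0) ** 0.25                  # ||f||_{U^2}

print(num_solutions, u2_4 * num_solutions / N ** 4, l4_fourth, sup_fhat, delta)
```

For a random bounded function $\delta$ is small, so the inequality is far from sharp here; replacing `f` by a linear phase restricted to real values, or by the constant function 1, gives a more demanding check.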

This argument should be compared to the argument in ijfSjl, to which it corresponds
rather closely.

To deduce Proposition 6.1 by passing through Proposition 8.2 is rather perverse,
since the derivation is longer than the one that proceeds via an analogue of JSJ)
and it leads to worse dependencies. With our current technology, however, this is
the only method which is amenable to generalisation.

Similarly, one may deduce Proposition 7.1 from Proposition 8.1 and the following
result.

Proposition 8.3 (Inverse theorem for the $U^3$-norm). Suppose that $f : [N] \to \mathbb{R}$
is a function for which $\|f\|_\infty \leq 1$ and $\|f\|_{U^3} \geq \delta$. Then there is a generalised
quadratic phase

$$\phi(n) = \sum_{r,s \leq C_1(\delta)} \beta_{rs}\{\theta_r n\}\{\theta_s n\} + \gamma_r\{\theta_r n\}, \tag{18}$$

where $\beta_{rs}, \gamma_r, \theta_r \in \mathbb{R}$, such that

$$|\mathbb{E}_{n \leq N} f(n)e(\phi(n))| \geq c_2(\delta).$$

We can take $C_1(\delta) \sim \exp(\delta^{-C})$ and $c_2(\delta) \sim \exp(-\delta^{-C})$.

This result (and variations of it involving other "quadratic families") is in fact
the main theorem in |lri|.

As we mentioned, one may find a series of seminorms which are analogous to 
the Gowers norms in the ergodic-theoretic work of Host and Kra [20]. There are no
such seminorms in the related work of Ziegler, however, and this suggests that
(as in the classical case) the Gowers norms may not be completely fundamental to 
a generalised Hardy-Littlewood method. 



9. Nilsequences 

In the previous section we introduced the Gowers $U^k$-norms, and stated inverse
theorems for the $U^2$- and $U^3$-norms. These inverse theorems provide lists of
rather algebraic functions which are characteristic for a given system of equations
$Ax = 0$. Roughly speaking, the linear phases $e(\theta n)$ are characteristic for single
linear equations in which $A$ is a $1 \times t$ matrix. Generalised quadratic phases $e(\phi(n))$
are characteristic for pairs of linear equations in which $A$ is a non-degenerate $2 \times t$
matrix.

These two results leave open the question of whether there is a similar list
of functions which is characteristic for the $U^k$-norm, $k \geq 4$, and hence, by the
Generalised von Neumann Theorem, for non-degenerate systems defined by an
$s \times t$ matrix with $s \geq 3$. The form of Propositions 8.2 and 8.3 does not suggest a
particularly natural form for such a result, however, and indeed Proposition 8.2 is
already rather unnatural-looking.

To make more natural statements, we introduce a class of functions called 
nilsequences. 

Definition 9.1. Let $G$ be a connected, simply connected, $k$-step nilpotent Lie
group. That is, the central series $G_0 := G$, $G_{i+1} = [G, G_i]$ terminates with $G_k = \{e\}$.
Let $\Gamma \subseteq G$ be a discrete, cocompact subgroup. The quotient $G/\Gamma$ is then
called a $k$-step nilmanifold. The group $G$ acts on $G/\Gamma$ via the map $T_g(x\Gamma) = xg\Gamma$.
If $F : G/\Gamma \to \mathbb{C}$ is a bounded, Lipschitz function and $x \in G/\Gamma$ then we refer to the
sequence $(F(T_g^n \cdot x))_{n \in \mathbb{N}}$ as a $k$-step nilsequence.

By analogy with the results of Host and Kra [20] in ergodic theory, we expect
the collection of $(k-1)$-step nilsequences to be characteristic for the $U^k$-norm.
The following conjecture is one of the guiding principles of the generalised Hardy- 
Littlewood method. 

Conjecture 9.2 (Inverse conjecture for $U^k$-norms). Suppose that $k \geq 2$ and that
$f : [N] \to [-1,1]$ has $\|f\|_{U^k} \geq \delta$. Then there is a $(k-1)$-step nilmanifold $G/\Gamma$ with
dimension at most $C_{1,k}(\delta)$, together with a function $F : G/\Gamma \to \mathbb{C}$ with $\|F\|_\infty \leq 1$
and Lipschitz constant at most $C_{2,k}(\delta)$ and elements $g \in G$, $x \in G/\Gamma$ such that

$$|\mathbb{E}_{n \leq N} f(n)F(T_g^n \cdot x)| \geq c_{3,k}(\delta). \tag{19}$$

We can at least be sure that Conjecture 9.2 is no more complicated than necessary,
since in [HQ Ch. 12] we showed that if a bounded function $f$ correlates
with a $(k-1)$-step nilsequence as in (19) then $f$ does have large Gowers $U^k$-norm.
This, incidentally, is another reason to believe that the Gowers norms play a
fundamental role in the theory. It is not the case that correlation of a function $f$
with a $(k-1)$-step nilsequence prohibits $f$ from enjoying cancellation along $k$-term
arithmetic progressions, for example. In the case $k = 3$ an example of this
phenomenon is given by the function $f$ which equals $\alpha$ for $1 \leq n \leq N/3$ and $-1$ for
$N/3 < n \leq N$, where $\alpha$ is the root between 1 and 2 of $\alpha^3 - \alpha^2 + 3\alpha - 4 = 0$. This
$f$ correlates with the constant nilsequence 1 yet exhibits cancellation along 3-term
progressions, as the reader may care to check.
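
For readers who would rather let a machine do the checking, here is a small numerical sketch. It assumes that "cancellation along 3-term progressions" means that the average of $f(x)f(x+d)f(x+2d)$ over all pairs $(x,d)$ with $x, x+d, x+2d \in [N]$ tends to zero; the choice $N = 600$ and the helper names are illustrative only.

```python
def cubic(a):
    return a ** 3 - a ** 2 + 3 * a - 4

def root_between(lo, hi, tol=1e-12):
    """Bisection for a root of cubic; assumes cubic(lo) < 0 < cubic(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cubic(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = root_between(1.0, 2.0)

N = 600
# f[n] for 1 <= n <= N (index 0 is unused padding).
f = [0.0] + [alpha if 3 * n <= N else -1.0 for n in range(1, N + 1)]

# Correlation with the constant nilsequence 1: this is just the mean of f.
mean_f = sum(f[1:]) / N

# Average of f(x) f(x+d) f(x+2d) over all progressions inside [1, N],
# allowing positive, negative and zero common difference d.
total, count = 0.0, 0
for x in range(1, N + 1):
    d_min = -((x - 1) // 2)
    d_max = (N - x) // 2
    for d in range(d_min, d_max + 1):
        total += f[x] * f[x + d] * f[x + 2 * d]
        count += 1
ap_average = total / count

print(alpha, mean_f, ap_average)
```

The mean of $f$ is $(\alpha - 2)/3$, which is bounded away from zero, whereas the progression average is only of size $O(1/N)$: in the limit it equals $(\alpha^3 - \alpha^2 + 3\alpha - 4)/9$, which vanishes by the choice of $\alpha$.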

Conjecture 9.2 seems, at first sight, to be completely unrelated to Propositions
8.2 and 8.3. However, after a moment's thought one realises that a linear phase
$e(\theta n)$ can be regarded as a 1-step nilsequence in which $G = \mathbb{R}$, $\Gamma = \mathbb{Z}$, $g = \theta$ and
$x = 0$. Thus Proposition 8.2 immediately implies the case $k = 2$ of Conjecture 9.2.

The case $k = 3$ has also been proved. One first proves Proposition 8.3 and then
one shows how any generalised quadratic phase $e(\phi(n))$ may be approximated by a
2-step nilsequence. Let us discuss a simple example, the Heisenberg nilmanifold, to
convince the reader that 2-step nilsequences can give rise to "generalised quadratic"
behaviour.

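Example 9.1 (The Heisenberg nilmanifold). Consider

$$G := \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix}, \qquad \Gamma := \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix}.$$

Then $G/\Gamma$ is a 2-step nilmanifold. By using the identification

$$(x,y,z) = \begin{pmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{pmatrix}$$

we can identify $G/\Gamma$ (as a set) with $\mathbb{R}^3$, quotiented out by the equivalence relations

$$(x,y,z) \sim (x + a, y + b + cx, z + c) \quad \text{for all } a, b, c \in \mathbb{Z}.$$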

This can in turn be coordinatised by the cylinder $(\mathbb{R}/\mathbb{Z})^2 \times [-1/2,1/2]$ with the
identification $(x,y,-1/2) \sim (x, x+y, 1/2)$. Let $F : G/\Gamma \to \mathbb{C}$ be a function. We
may lift this to a function $\tilde{F} : G \to \mathbb{C}$, defined by $\tilde{F}(g) := F(g\Gamma)$. In coordinates,
this lift takes the form

$$\tilde{F}(x,y,z) = F(x \ (\mathrm{mod}\ 1), \ y - [z]x \ (\mathrm{mod}\ 1), \ \{z\})$$

where $[z] = z - \{z\}$ is the nearest integer to $z$. Let

$$g := \begin{pmatrix} 1 & \alpha & \beta \\ 0 & 1 & \gamma \\ 0 & 0 & 1 \end{pmatrix}$$

be an element of $G$. Then the shift $T_g : G \to G$ is given by

$$T_g(x,y,z) = (x + \alpha, \ y + \beta + \gamma x, \ z + \gamma).$$

A short induction confirms, for example, that

$$T_g^n(0,0,0) = (n\alpha, \ n\beta + \tfrac{1}{2}n(n-1)\alpha\gamma, \ n\gamma).$$

Therefore if $F : G/\Gamma \to \mathbb{C}$ is any Lipschitz function, written as a function
$F : (\mathbb{R}/\mathbb{Z})^2 \times [-1/2,1/2] \to \mathbb{C}$ with $F(x,y,-1/2) = F(x, x+y, 1/2)$, then we have

$$F(T_g^n(0,0,0)) = F(n\alpha \ (\mathrm{mod}\ 1), \ n\beta + \tfrac{1}{2}n(n-1)\alpha\gamma - [n\gamma]n\alpha \ (\mathrm{mod}\ 1), \ \{n\gamma\}).$$

The term $[n\gamma]n\alpha$ which appears here certainly exhibits a sort of generalised
quadratic behaviour. For a complete description of how an arbitrary generalised
quadratic phase $e(\phi(n))$ can be approximated by a two-step nilsequence, we refer
the reader to [El Ch. 12].
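
The orbit computation in this example is easy to reproduce. The sketch below iterates the shift $(x,y,z) \mapsto (x + \alpha, y + \beta + \gamma x, z + \gamma)$ in exact rational arithmetic, compares the result with the closed form $(n\alpha, n\beta + \binom{n}{2}\alpha\gamma, n\gamma)$ that the iteration produces, and reduces points to the cylinder $(\mathbb{R}/\mathbb{Z})^2 \times (-1/2, 1/2]$. The function names and the sample element $g$ are assumptions made for illustration.

```python
import math
from fractions import Fraction

def shift(point, g):
    """One application of T_g in the coordinates (x, y, z)."""
    x, y, z = point
    a, b, c = g                       # (alpha, beta, gamma)
    return (x + a, y + b + c * x, z + c)

def iterate(g, n):
    """T_g^n applied to the origin, by direct iteration."""
    point = (Fraction(0), Fraction(0), Fraction(0))
    for _ in range(n):
        point = shift(point, g)
    return point

def closed_form(g, n):
    """(n alpha, n beta + n(n-1)/2 * alpha gamma, n gamma)."""
    a, b, c = g
    return (n * a, n * b + Fraction(n * (n - 1), 2) * a * c, n * c)

def reduce_to_cylinder(point):
    """Representative (x mod 1, y - [z]x mod 1, {z}) with {z} in (-1/2, 1/2]."""
    x, y, z = point
    k = math.ceil(z - Fraction(1, 2))   # nearest integer [z]
    return (x % 1, (y - k * x) % 1, z - k)

g = (Fraction(1, 3), Fraction(2, 7), Fraction(3, 5))
for n in range(6):
    print(n, iterate(g, n), reduce_to_cylinder(iterate(g, n)))
```

The middle coordinate of the reduced point is where the bracket term $[n\gamma]n\alpha$ enters: it is produced entirely by the reduction step, not by the shift itself.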
Let us conclude this section by stating, for the reader's convenience, a
result/conjecture which summarises much of our discussion so far in one place.

Theorem 9.3 (G.-Tao [13]). We have the following two statements.

(i) (Generalised von Neumann) Suppose that $s, t$ are positive integers with $s + 2 \leq t$.
Suppose that $A$ is a non-degenerate $s \times t$ matrix with integer entries. Suppose
that $f_1,\dots,f_t : [N] \to [-1,1]$ are functions and that

$$\Big| \mathbb{E}_{\substack{x_1,\dots,x_t \\ Ax = 0}} f_1(x_1) \cdots f_t(x_t) \Big| \geq \delta. \tag{20}$$

Then for each $i = 1,\dots,t$ we have

$$\|f_i\|_{U^{s+1}} \geq c_A\delta.$$

(ii) (Gowers inverse result: proved for $k = 2, 3$, conjectural for $k \geq 4$) Suppose
that $f : [N] \to [-1,1]$ has $\|f\|_{U^k} \geq \delta$. Then there is a $(k-1)$-step nilmanifold
$G/\Gamma$ with dimension at most $C_{1,k}(\delta)$, together with a function $F : G/\Gamma \to \mathbb{C}$
with $\|F\|_\infty \leq 1$ and Lipschitz constant at most $C_{2,k}(\delta)$ and elements $g \in G$,
$x \in G/\Gamma$ such that

$$|\mathbb{E}_{n \leq N} f(n)F(T_g^n \cdot x)| \geq c_{3,k}(\delta). \tag{21}$$

In particular when $s = 1$ or 2 and (20) holds for some $A$ and some $\delta$ then for
each $i = 1,\dots,t$ there is a 2-step nilsequence $(F(T_g^n \cdot x))_{n \in \mathbb{N}}$ such that

$$|\mathbb{E}_{n \leq N} f_i(n)F(T_g^n \cdot x)| \geq c_A(\delta). \tag{22}$$



10. Working with the primes 

Let us suppose that we wish to count four-term progressions in the primes. One
might try to apply Theorem 9.3 with the functions $f_i$ equal to the balanced function
of $A$, the set of primes $p \leq N$, and then hope to rule out a correlation such as (21)
for some $\delta = o(\alpha^4)$ (here, of course, $\alpha \approx \log^{-1} N$ by the prime number theorem).
This would then lead to an asymptotic using various instances of (20) together
with the triangle inequality.

There are two reasons why this is a hopeless strategy. First of all, the primes 
do correlate with nilsequences. In fact since all primes other than 2 are odd it is 
easy to see that 

$$\mathbb{E}_{n \leq N} f_A(n)e(n/2) \approx -\alpha.$$

There is a way to circumvent this problem, which we call the $W$-trick. The idea is
that if $W = 2 \times 3 \times \dots \times w(N)$ is the product of the first several primes, then for
any $b$ coprime to $W$ the set

$$A_b := \{n \leq N : Wn + b \text{ is prime}\}$$

does not exhibit significant bias in progressions with common difference $q \leq w(N)$.
One can then count 4-term progressions in the primes by counting 4-term progressions
in $A_{b_1} \times \dots \times A_{b_4}$ for each quadruple $(b_1,\dots,b_4) \in (\mathbb{Z}/W\mathbb{Z})^4$ in arithmetic
progression and adding.
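
The effect of the $W$-trick on the simplest obstruction, parity, can be seen in a few lines. The following sketch is only an illustration: it takes $W = 2 \times 3 \times 5$ and $b = 1$ (values chosen here, far smaller than any $w(N)$ one would use in a proof), and compares the correlation with $e(n/2) = (-1)^n$ of the balanced function of the primes with that of the balanced function of $A_b$.

```python
def prime_sieve(limit):
    """Boolean list is_prime[0..limit] from the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
        p += 1
    return is_prime

N = 20000
W, b = 2 * 3 * 5, 1
is_prime = prime_sieve(W * N + b)

# Balanced function of the primes up to N, tested against e(n/2) = (-1)^n.
alpha = sum(is_prime[1:N + 1]) / N
bias_primes = sum((is_prime[n] - alpha) * (-1) ** n for n in range(1, N + 1)) / N

# W-tricked primes A_b = {n <= N : W n + b is prime}, same test.
in_Ab = [False] + [is_prime[W * n + b] for n in range(1, N + 1)]
alpha_b = sum(in_Ab[1:]) / N
bias_tricked = sum((in_Ab[n] - alpha_b) * (-1) ** n for n in range(1, N + 1)) / N

print("primes:   density", alpha, "correlation", bias_primes)
print("W-tricked: density", alpha_b, "correlation", bias_tricked)
```

For the primes the correlation is essentially $-\alpha$, as large as it could possibly be, while for $A_b$ it is a small fraction of the density $\alpha_b$.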

We refer to any set $A_b$ as a set of "$W$-tricked primes". In practice one is only free
to take $w(N) \sim \log\log N$, since one must be able to understand the distribution
of primes in progressions with common difference $W$ (note that even on GRH one
could only take $w(N) \sim c\log N$). Even assuming we could obtain optimal results
concerning the correlation of the $W$-tricked primes with 2-step nilsequences, this
information will be very weak indeed.

This highlights a more serious problem with the suggested strategy. Suppose 
that A C [N] is a set of density a for which there is no obvious reason why A 
should have an unexpectedly large or small number of 4-term APs, that is to say 
for which we might hope to prove that 

$$\mathbb{E}_{\substack{x_1,x_2,x_3,x_4 \\ x_1 - 2x_2 + x_3 = 0 \\ x_2 - 2x_3 + x_4 = 0}} 1_A(x_1)1_A(x_2)1_A(x_3)1_A(x_4) \approx \alpha^4. \tag{23}$$

For example, $A$ might be the $W$-tricked primes less than $N$, in which case $\alpha \sim$

We might prove (23) by writing $1_A = \alpha + f_A$, expanding as the sum of sixteen
terms, and showing that fifteen of these are $o(\alpha^4)$ by appealing to Theorem 9.3
and ruling out a correlation with a 2-step nilsequence as in (22). Unfortunately
we will be operating with $\delta = o(\alpha^4) \ll \log^{-4+\epsilon} N$, and the dependence of $c_A(\delta)$
on $\delta$ is very weak, being of the form $\exp(-\delta^{-C})$. Thus we are asking to rule out
the possibility that

$$|\mathbb{E}_{n \leq N} f_A(n)F(T_g^n \cdot x)| \geq \exp(-\log^C N)$$

for some potentially rather large $C$. This is a problem, since one would never
expect more than square root cancellation in any such expression. In fact for the
$W$-tricked primes one only has a small amount (depending on $w(N)$) of potential
cancellation to work with and to all intents and purposes one should not bank on
having available any estimate stronger than

$$\mathbb{E}_{n \leq N} f_A(n)F(T_g^n \cdot x) = o(1).$$

What one really needs is a version of Proposition 1161 which applies to functions
which need not be bounded by 1. Then one could hope to work with the von
Mangoldt function $\Lambda$ instead of the far less natural characteristic function $1_A$, or
more accurately with $W$-tricked variants of the von Mangoldt function such as

$$\Lambda_{b,W}(n) := \frac{\phi(W)}{W}\Lambda(Wn + b).$$
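
The normalising factor $\phi(W)/W$ is there to give the $W$-tricked von Mangoldt function mean value roughly 1, which is what makes it reasonable to compare it with the constant function 1. A quick numerical sketch of this, with the illustrative (and unrealistically small) choices $W = 30$, $b = 1$, $N = 20000$ and helper names invented here:

```python
import math

def von_mangoldt_table(limit):
    """lam[m] = log p if m is a power of a prime p, and 0 otherwise, for 0 <= m <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    lam = [0.0] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
            q = p
            while q <= limit:          # mark p, p^2, p^3, ...
                lam[q] = math.log(p)
                q *= p
    return lam

def euler_phi(W):
    """Euler's totient function by trial division."""
    result, m, p = W, W, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

N = 20000
W, b = 30, 1
lam = von_mangoldt_table(W * N + b)

def lambda_bW(n):
    """W-tricked von Mangoldt function (phi(W)/W) * Lambda(W n + b)."""
    return euler_phi(W) / W * lam[W * n + b]

mean_tricked = sum(lambda_bW(n) for n in range(1, N + 1)) / N
print(euler_phi(W), mean_tricked)
```

By the prime number theorem in arithmetic progressions the printed mean should be close to 1.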

Such a result is the main result of our forthcoming paper [15]. It would take us
too far afield to say anything concerning its proof, other than that it uses one
of the key tools from our paper on long progressions of primes, the "ergodic
transference" technology of ^1 Chs. 6,7,8].
Proposition 10.1 (Transference principle, ^Hl). Suppose that $\nu : [N] \to \mathbb{R}^+$ is a
pseudorandom measure. Then

(i) The generalised von Neumann theorem, Theorem 9.3 (i), continues to hold
for functions $f_1,\dots,f_t : [N] \to \mathbb{R}^+$ such that $|f_i(x)| \leq 1 + \nu(x)$ pointwise
(the value of $c_A$ may need to be reduced slightly).

(ii) If the Gowers inverse conjecture, Theorem 9.3 (ii), holds for a given value
of $k$ then it continues to hold for a function $f$ such that $|f(x)| \leq 1 + \nu(x)$
pointwise. In particular such an extension of the Gowers inverse conjecture
is true when $k = 2, 3$.

The reader may consult [12, Ch. 3] for a definition of the term pseudorandom
measure and a discussion concerning it. For the purposes of this article the reader
can merely accept that there is such a notion, and furthermore that one may
construct a pseudorandom measure $\nu : [N] \to \mathbb{R}^+$ such that $\nu + 1$ dominates any fixed
$W$-tricked von Mangoldt function $\Lambda_{b,W}$. The construction of $\nu$ comes from sieve
theoretic ideas originating with Selberg. The confirmation that $\nu$ is pseudorandom
is essentially due, in a very different context, to Goldston and Yildirim [7].

Applying these two results, one may see that the Hardy-Littlewood Conjecture
2.1 for a given non-degenerate $s \times t$ matrix $A$ is a consequence of the Gowers
inverse conjecture in the case $k = s + 1$ together with a bound of the form

$$\mathbb{E}_{n \leq N}(\Lambda_{b,W}(n) - 1)F(T_g^n \cdot x) = o_{G/\Gamma,F}(1) \tag{24}$$

for every $s$-step nilsequence $(F(T_g^n \cdot x))_{n \in \mathbb{N}}$.

By effecting a decomposition of $\Lambda_{b,W}$ as $\Lambda_{b,W}^{\sharp} + \Lambda_{b,W}^{\flat}$ rather like that in 21 the
proof of this statement may be further reduced to a similar result for the Möbius
function:

Conjecture 10.2 (Möbius and nilsequences). For all $A > 0$ we have the bound

$$\mathbb{E}_{n \leq N} \mu(n)F(T_g^n \cdot x) \ll_{A,G/\Gamma,F} \log^{-A} N$$

for every $k$-step nilsequence $(F(T_g^n \cdot x))_{n \in \mathbb{N}}$.

Note that we require more cancellation (a power of a logarithm) here than in
(24). This is because in passing from $\mu$ to $\Lambda_{b,W}$ one loses a logarithm in performing
partial summation as in the derivation of Q. The method we have in mind to
prove Conjecture 10.2, however, is likely to give this strong cancellation at no extra
cost.

Conjecture 10.2 posits a rather vast generalisation of Davenport's bound. The
conjecture is, of course, highly plausible in view of the Möbius randomness law.

Let us remark that the derivation of (24) from Conjecture 10.2 is not at all
immediate, since one must also handle the contribution from $\Lambda_{b,W}^{\sharp}$. To do this
one uses methods of classical analytic number theory rather similar to those of
Goldston and Yildirim [7].
11. Möbius and nilsequences

The main result of [21 is a proof of Conjecture 10.2 in the case $k = 2$. This leads,
by the reasoning outlined in the previous section, to a proof of Conjecture 2.4 in
the case $s = 2$.

We remarked that the classical Hardy-Littlewood method was a technique of 
harmonic analysis. We also highlighted the idea of dividing into major and minor 
arcs. We have said much on the subject of generalising the underlying harmonic 
analysis, but as yet there has been nothing said about a suitable extension of 
major and minor arcs. In this section we describe such an extension by making 
some remarks concerning the proof of the case $k = 2$ of Conjecture 10.2.

Earlier we discussed how bounds on Type I and II sums may be used to show
that a given function $f$ does not correlate with Möbius. Recalling our "inverse"
strategy for proving Davenport's bound, one might be tempted to go straight into
Proposition 5.1 with $f(n) = F(T_g^n \cdot x)$, a 2-step nilsequence, posit largeness of
either a Type I or a Type II sum, and then use this to say that the nilsequence is
somehow "major arc". One might then hope to handle the major nilsequences by
some other method, perhaps the theory of $L$-functions.

Such an attempt is a little too simplistic, for the following reason. Returning to
the 1-step case, note that the sum of two 1-step nilsequences is also a 1-step
nilsequence (on the product nilmanifold $G_1/\Gamma_1 \times G_2/\Gamma_2$). In particular, the function
$f(n) = e(n/5) + e(n\sqrt{2})$ is a 1-step nilsequence. We know, however, that to handle
correlation of Möbius with $e(n/5)$ we need to know something about $L$-functions,
whereas we do not have an $L$-function method of handling $e(n\sqrt{2})$. This suggests
that some sort of preliminary decomposition of the function $f$ is in order, and such
a suggestion turns out to be correct.

The 2-step nilsequence $F(T_g^n \cdot x)$ can be decomposed into local quadratics.
These are objects of the form

$$f(n) := 1_{B_N}(n)e(\phi(n)), \tag{25}$$

where $B_N$ is a set of the form

$$B_N := \{n : N/2 < n < N : F_1(n) \approx 0\}$$

for some 1-step nilsequence $F_1$ depending on $F$, $G/\Gamma$, $g$ and $x$, and $\phi : B_N \to \mathbb{R}/\mathbb{Z}$
is locally quadratic. This means that one may unambiguously define the second
derivative $\phi''(h_1,h_2)$ to equal

$$\phi(x + h_1 + h_2) - \phi(x + h_1) - \phi(x + h_2) + \phi(x)$$

for any $x$ such that $x, x + h_1, x + h_2, x + h_1 + h_2 \in B_N$.

It turns out that for the purposes of analysing Type I and II sums the cutoff
$1_{B_N}$ plays a subservient role. The phase $\phi$, on the other hand, is crucial. The bulk
of [14] is devoted to showing that if either a Type I or a Type II sum involving
some $f$ as in (25) is large, then $\phi$ is major arc. This is a direct analogue of the
proof of Davenport's bound as phrased at the end of ^Sl (the "inverse" approach).
Roughly speaking, $\phi$ is said to be major arc if $q\phi''(h_1,h_2)$ is small for some smallish
$q$ and all $h_1, h_2$, which in turn essentially means that $\phi$ is slowly varying on $B_N$
intersected with any fixed progression $a \ (\mathrm{mod}\ q)$. For a detailed discussion see
[Tl). Suffice it to say that the passage from large Type I/II sum to $\phi$ being major
arc is long and difficult, and requires many applications of the Cauchy-Schwarz
inequality to manipulate the phase $\phi$ into a helpful form, as well as basic tools of
equidistribution such as a version of the Erdős-Turán inequality.

Recalling Proposition 5.1 one has reduced the case $s = 2$ of Conjecture 10.2
to the statement that

$$\mathbb{E}_{n \leq N} \mu(n)1_{B_N}(n)e(\phi(n)) \ll_A \log^{-A} N$$

for any major arc phase $\phi$. It turns out that $1_{B_N}(n)e(\phi(n))$ can, in this case, be
closely approximated by a sum of linear phases $e(\theta n)$, and so we may conclude
using Proposition ^. II

Note that this analysis has the flavour of an induction on s, the step of the 
nilsequence we are considering. We expect to see this more clearly when addressing 
the general case of Conjecture 10.2 in future work.



12. Future directions 

The most obvious avenue of research left open is to generalise everything we have
done for $s = 2$ to the case $s \geq 3$. In particular we would like inverse theorems for the
$U^k$-norms for $k \geq 4$, and a proof of Conjecture 10.2 for $s \geq 3$. We are currently
working towards this goal. We expect that the methods of Gowers ^U] can be
adapted to achieve the inverse theorem, though this will not be straightforward.
It is also very likely that the "inverse" approach to handling Type I and II sums
can be adapted to the higher-step case of Conjecture 10.2, though again we do not
expect this to be wholly straightforward.

It would be very desirable to have good bounds for error terms such as the $o(1)$
in Theorem 2.5. We are sure that our current estimate for the error in Theorem
2.5 is the worst that has ever featured in analytic number theory - the error term
is a completely ineffective $o(1)$! Ultimately this is because to show that the error
is less than $\delta$ one finds oneself needing to rule out a real zero of some $L(s,\chi)$,
$\chi$ a primitive quadratic character to the modulus $q$, with $s > 1 - Cq^{-\epsilon}$, where
$\epsilon = \epsilon(\delta) \to 0$ as $\delta \to 0$. Siegel's theorem states that for any $\epsilon > 0$ there is such a
$C$, but it is, of course, not possible to specify $C$ effectively.

It is clear that the spectre of ineffectivity does not rear its head under the
assumption of GRH, and we believe that our methods lead to an error term of the
form $\log^{-c} N$ in Theorem 2.5.

There are other, presumably more tractable, ways in which one might obtain an
explicit error term. Improvements to the combinatorial tools used in [13],
particularly advances on the circle of conjectures known as the "polynomial Freiman-Ruzsa
conjecture", could be very helpful here.
We turn now to goals which lie further away. I have hinted at various places in
this survey that the way in which we see nilsequences arising is very long-winded
and, presumably, not the "right" way. The ergodic theorists [20] ESI do admittedly
discover the role of these functions somewhat less painfully (albeit after setting up
a good deal of notation). Nilsequences seem such natural objects, however, that
there ought to be a much better way of appreciating their place in the study of
systems of linear equations. Recalling that $\|f\|_{U^2}$ is essentially the $L^4$ norm of $\widehat{f}$
one might even ask, for example,

Question 12.1. Is there a usable "formula" relating $\|f\|_{U^3}$ and certain of the
"nil-fourier coefficients" $\mathbb{E}_{n \leq N} f(n)F(T_g^n \cdot x)$?

Such a formula would assuredly have to be very exotic on account of the vast
profusion of nilsequences which might enter into consideration. The nilsequences
are not naturally parametrised by anything so simple as the circle $S^1$, which gave
its name to the classical circle method.

Let us conclude with some speculations on non-linear systems of equations,
where our knowledge is at present essentially non-existent. We have seen in
Conjecture 9.2 that the behaviour of any system $Ax = b$, where $A$ is non-degenerate
in the sense of Definition 2.3, should be governed by a very "hard" or "algebraic"
collection of characteristic functions, in this case the nilsequences.

On the other hand degenerate linear systems, such as $x_1 - x_2 = 1$, do not have
this property. To see this, suppose that $N = 2m$ is even and let $A \subseteq [N]$ be
a set formed by setting $A \cap \{2i, 2i+1\} = \{2i\}$ or $\{2i+1\}$, these choices being
independent in $i$ for $i = 0,\dots,m-1$. Then $|A| = N/2$, and $A$ is indistinguishable
from a truly random set by taking inner products with any conceivable "hard"
character such as a linear or quadratic phase. However, $A$ is expected to have
about $N/8$ solutions to $x_1 - x_2 = 1$, whereas a random set has about twice this
many.
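
This example is easy to simulate. The sketch below builds the paired set and a genuinely random set of density 1/2, counts solutions to $x_1 - x_2 = 1$ in each, and measures the correlation of the balanced function of the paired set with one sample linear phase. The seed, the size $N = 200000$, the frequency $\sqrt{2}$ and the function names are arbitrary choices made for illustration.

```python
import cmath
import math
import random

random.seed(1)
m = 100_000
N = 2 * m

# Paired set: exactly one element of each pair {2i, 2i+1}, chosen independently.
paired = [False] * N
for i in range(m):
    paired[2 * i + random.randrange(2)] = True

# Comparison: each element included independently with probability 1/2.
uniform = [random.random() < 0.5 for _ in range(N)]

def count_consecutive(indicator):
    """Number of n with n and n+1 both in the set, i.e. solutions to x1 - x2 = 1."""
    return sum(1 for n in range(len(indicator) - 1) if indicator[n] and indicator[n + 1])

def correlation(indicator, theta):
    """|E_n (1_A(n) - 1/2) e(theta n)|, the correlation with a linear phase."""
    L = len(indicator)
    total = sum((indicator[n] - 0.5) * cmath.exp(2j * math.pi * theta * n) for n in range(L))
    return abs(total) / L

paired_count = count_consecutive(paired)
uniform_count = count_consecutive(uniform)
paired_corr = correlation(paired, math.sqrt(2))

print(paired_count, N / 8)
print(uniform_count, N / 4)
print(paired_corr)
```

The two counts differ by a factor of about two even though the linear phase cannot tell the sets apart, which is the sense in which no "hard" characteristic system can exist for this equation.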

One might call an equation or system of equations for which a "hard"
characteristic system exists a mixing system. We do not have a precise definition of
this notion. Some non-linear equations are known to be mixing - for example, the
linear phases $e(\theta n)$ form a characteristic system for the equation $x_1 + x_2 = x_3^2$.
Many more are not. It would be very interesting to know, for example, whether
the equation $x_1x_2 - x_3x_4 = 1$ is mixing and, if so, what a characteristic system
for it might be. This seems to be a very difficult question as the analysis of this
equation even in very specific situations involves deep methods from the theory of
automorphic forms.